Designing a Data Source Discovery App - Part 15: A Primarily Data Insert Only Approach

by DL Keeshin


March 24, 2025


[Figure: kDS Discovery entity-relationship diagram — images/future_kds_discovery_erd_20241010.png]

Introduction

Last time I talked about using LLMs for generating test data. Today I want to discuss how the kDS Discovery App follows an insert-only with update exceptions approach. This means that instead of modifying existing records, new records are inserted whenever possible. This approach is beneficial because:

  • Better Data Integrity: Maintaining historical records instead of altering existing data preserves a clear audit trail and accountability.
  • Cheaper Storage Costs: Modern databases handle inserts efficiently, and storing additional rows is often more cost-effective than handling complex update operations.
  • Faster Performance: Inserts generally outperform updates, which carry additional locking and transactional overhead. In PostgreSQL's MVCC model, an UPDATE effectively writes a new row version and marks the old one dead, making it more resource-intensive than a simple INSERT.

However, there are times when updates are necessary, such as when an interview's start and end dates are not yet known when the interview record is generated. This post describes a structured approach to handling these update exceptions using triggers and functions in PostgreSQL.

Schema Overview

The client.interview table stores interview-related data, including details such as the role ID, request date, model ID, cost, frequency, and administrative information. To keep a historical log of any modifications, the client.interview_history table is used.


Table client.interview {
  interview_id uuid [pk, not null, default: `gen_random_uuid()`]
  role_id uuid [not null]
  request_date date
  model_id uuid [not null]
  cost numeric(6,2)
  frequency_ varchar(32) [not null]
  interview_admin varchar(96)
  approval_date date
  start_date date
  end_date date
  create_date timestamptz [not null, default: `CURRENT_TIMESTAMP`]
  modified_date timestamptz
  created_by varchar(96)
  modified_by varchar(96)
  author_id uuid
  source_ varchar(96) [not null]
  title_ varchar(64)
  project_id uuid
}
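For readers who prefer plain DDL, the DBML above translates to roughly the following PostgreSQL statement. This is a sketch: column names and types follow the diagram, but foreign-key constraints (for example on role_id and model_id) are omitted here.

```sql
CREATE TABLE client.interview (
    interview_id    uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    role_id         uuid NOT NULL,
    request_date    date,
    model_id        uuid NOT NULL,
    cost            numeric(6,2),
    frequency_      varchar(32) NOT NULL,
    interview_admin varchar(96),
    approval_date   date,
    start_date      date,
    end_date        date,
    create_date     timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
    modified_date   timestamptz,
    created_by      varchar(96),
    modified_by     varchar(96),
    author_id       uuid,
    source_         varchar(96) NOT NULL,
    title_          varchar(64),
    project_id      uuid
);
```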

The Interview History Table

The client.interview_history table is a mirror of the client.interview table with additional columns:

  • history_id: A unique identifier for each historical record.
  • change_type: Specifies whether the record was created or updated.
  • change_reason: Stores the reason for modification.

Table client.interview_history {
  history_id uuid [pk, not null, default: `gen_random_uuid()`]
  interview_id uuid [not null]
  role_id uuid [not null]
  request_date date
  model_id uuid [not null]
  cost numeric(6,2)
  frequency_ varchar(32) [not null]
  interview_admin varchar(96)
  approval_date date
  start_date date
  end_date date
  create_date timestamptz [not null]
  modified_date timestamptz [not null, default: `CURRENT_TIMESTAMP`]
  created_by varchar(96)
  modified_by varchar(96) [not null]
  author_id uuid
  source_ varchar(96) [not null]
  title_ varchar(64)
  project_id uuid
  change_type varchar(20)
  change_reason text
}

Tracking Changes with a Trigger Function

The client.fn_log_interview_change() function is executed whenever a record in client.interview is inserted or updated. It captures the previous values and inserts them into client.interview_history. Here’s how it works:

  • If a new interview record is inserted, the function logs it as a CREATE operation.
  • If an existing interview record is updated, the function logs the previous state as an UPDATE operation.
  • The modified_date is set to the current timestamp, and the change_reason argument is recorded.

CREATE OR REPLACE FUNCTION client.fn_log_interview_change() 
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'UPDATE' THEN
    INSERT INTO client.interview_history (
      interview_id, role_id, request_date, model_id, cost, frequency_,
      interview_admin, approval_date, start_date, end_date, create_date,
      modified_date, created_by, modified_by, author_id, source_,
      title_, project_id, change_type, change_reason
    ) VALUES (
      OLD.interview_id, OLD.role_id, OLD.request_date, OLD.model_id, OLD.cost, OLD.frequency_,
      OLD.interview_admin, OLD.approval_date, OLD.start_date, OLD.end_date, OLD.create_date,
      CURRENT_TIMESTAMP, OLD.created_by, NEW.modified_by, OLD.author_id, OLD.source_,
      OLD.title_, OLD.project_id, 'UPDATE', TG_ARGV[0]
    );
  ELSIF TG_OP = 'INSERT' THEN
    INSERT INTO client.interview_history (
      interview_id, role_id, request_date, model_id, cost, frequency_,
      interview_admin, approval_date, start_date, end_date, create_date,
      modified_date, created_by, modified_by, author_id, source_,
      title_, project_id, change_type, change_reason
    ) VALUES (
      NEW.interview_id, NEW.role_id, NEW.request_date, NEW.model_id, NEW.cost, NEW.frequency_,
      NEW.interview_admin, NEW.approval_date, NEW.start_date, NEW.end_date, NEW.create_date,
      CURRENT_TIMESTAMP, NEW.created_by, NEW.created_by, NEW.author_id, NEW.source_,
      NEW.title_, NEW.project_id, 'CREATE', TG_ARGV[0]
    );
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

tr_interview_history Trigger

The tr_interview_history trigger ensures that client.fn_log_interview_change() runs automatically:

CREATE TRIGGER tr_interview_history
AFTER INSERT OR UPDATE ON client.interview
FOR EACH ROW EXECUTE FUNCTION client.fn_log_interview_change('');

This means that every modification to client.interview is recorded without requiring additional manual intervention.
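To see the trigger in action, here is a quick sketch. The column values ('annual', 'manual entry', 'dkeeshin', and the dates) are made up for illustration; gen_random_uuid() stands in for real role and model IDs.

```sql
-- Insert a new interview; tr_interview_history logs it as change_type = 'CREATE'
INSERT INTO client.interview (role_id, model_id, frequency_, source_, created_by)
VALUES (gen_random_uuid(), gen_random_uuid(), 'annual', 'manual entry', 'dkeeshin');

-- Update it; the trigger logs the previous state as change_type = 'UPDATE'
UPDATE client.interview
SET start_date = '2025-04-01', modified_by = 'dkeeshin'
WHERE frequency_ = 'annual';

-- Review the audit trail
SELECT interview_id, change_type, modified_date, modified_by
FROM client.interview_history
ORDER BY modified_date;
```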

Updating Interviews with client.fn_update_interview()

The client.fn_update_interview() function allows for updating interview records dynamically using JSONB input. This approach provides flexibility by allowing partial updates without requiring all column values.

  1. The function takes the following parameters:
    • interview_id_param: The ID of the interview to update.
    • updated_values: A JSONB object containing the new values.
    • modified_by_param: The user making the modification.
    • change_reason_param: An optional reason for the update.
  2. The function updates only the provided fields using COALESCE(), ensuring that existing values remain unchanged if not included in updated_values.
  3. The modified_date is automatically set to the current timestamp.
  4. The function returns TRUE if an update was performed, ensuring that the calling application can verify the change.

CREATE OR REPLACE FUNCTION client.fn_update_interview(
  interview_id_param uuid,
  updated_values jsonb,
  modified_by_param character varying,
  change_reason_param text DEFAULT NULL::text)
RETURNS boolean
LANGUAGE plpgsql
AS $BODY$
BEGIN
  -- Perform a partial update using the jsonb input; COALESCE keeps the
  -- existing value for any key not present in updated_values.
  UPDATE client.interview
  SET
    role_id       = COALESCE((updated_values->>'role_id')::uuid, role_id),
    frequency_    = COALESCE(updated_values->>'frequency_', frequency_),
    request_date  = COALESCE((updated_values->>'request_date')::date, request_date),
    approval_date = COALESCE((updated_values->>'approval_date')::date, approval_date),
    start_date    = COALESCE((updated_values->>'start_date')::date, start_date),
    end_date      = COALESCE((updated_values->>'end_date')::date, end_date),
    modified_by   = modified_by_param,
    modified_date = CURRENT_TIMESTAMP
  WHERE interview_id = interview_id_param;
  RETURN FOUND;  -- true if a matching row was updated
END;
$BODY$;
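A call might look like the following. The interview_id shown is a placeholder, and only the keys present in the JSONB object are changed; the function returns true when a matching row was found.

```sql
SELECT client.fn_update_interview(
  'a3c1f2a4-0000-0000-0000-000000000000'::uuid,            -- placeholder interview_id
  '{"start_date": "2025-04-01", "end_date": "2025-04-15"}'::jsonb,
  'dkeeshin',                                              -- illustrative modified_by
  'Scheduled interview window'                             -- change_reason
);
```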

Final Thoughts

The benefits of data design built around an insert-only strategy cannot be overemphasized. It truly creates a faster, better, and cheaper data solution. While it is difficult to avoid updates, handling updates in the way described here is by far the best approach.

Also, as mentioned recently, my company, kDS LLC, is actively seeking organizations interested in adopting a beta version of the kDS Data Source Discovery App. If your organization is interested, please let us know. We’d love to collaborate. Tell us in the comments below or email us at info@keeshinds.com

As always, thanks for stopping by!
